Exact and Approximate Expressions for the
Probability of Undetected Error of 
Varshamov-Tenengol'ts Codes 

I. Introduction 

The Z-channel is a memoryless binary channel. On this channel, a 1 can be changed to a 0
with some probability p (called the channel error probability), but a 0 is never changed. This
channel is a useful model for a number of applications, such as semiconductor memories, some kinds
of optical systems, and other practical environments (examples can be found in [1, Chapter 7]
and [2]). In [3], it was demonstrated that the Z-channel is the only binary-input binary-output 
channel that is never dropped in optimal probability loading for parallel binary channels with a 
total probability constraint. For a survey of classical results on codes for the Z-channel, see [4]. 

Several constructions can be adopted for designing codes over the Z-channel with given length 
and error correction capability, and bounds on their size can be derived, based on each specific 
construction [5]. For single error correcting codes, which are of interest in this paper, further
bounds can be found in [6].

We consider a well-known class of single error correcting codes for the Z-channel, namely
the class of Varshamov-Tenengol'ts (VT) codes [7]. We describe these codes and some of their
properties. Let $F_2 = \{0, 1\}$ denote the binary field, and let $\mathbb{Z}_{n+1}$ be the additive group of
integers modulo $n + 1$. For each $g \in \mathbb{Z}_{n+1}$, the VT code $V_g$ of length $n$ is the set of vectors
$\mathbf{x} = (x_1, x_2, \ldots, x_n) \in F_2^n$ such that

$$\sum_{m=1}^{n} m x_m \equiv g \pmod{n+1}. \tag{1}$$

We can observe that the all-zero codeword, denoted by $\mathbf{0}$, always belongs to $V_0$, while the all-one
codeword, denoted by $\mathbf{1}$, belongs to $V_g$ with $g \equiv n(n+1)/2 \pmod{n+1}$, that is, $g = 0$ for even $n$
and $g = (n+1)/2$ for odd $n$. Throughout, $\lfloor x \rfloor$ denotes the largest integer $m$ such that $m \le x$.
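As an illustration of definition (1), the following minimal sketch enumerates a VT code by brute force. The function name `vt_code` is a hypothetical helper introduced here, not something taken from the paper, and the exhaustive search is only practical for short lengths.

```python
from itertools import product


def vt_code(n, g=0):
    """Return all codewords of the VT code V_g of length n as tuples of bits."""
    modulus = n + 1
    code = []
    for x in product((0, 1), repeat=n):
        # Positions are numbered 1..n, as in (1).
        checksum = sum(m * bit for m, bit in enumerate(x, start=1))
        if checksum % modulus == g % modulus:
            code.append(x)
    return code
```

For instance, `vt_code(4)` lists the codewords of $V_0$ for $n = 4$, and summing `len(vt_code(n, g))` over all $g$ recovers $2^n$, since the codes $V_g$ partition $F_2^n$.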

Construction (1) can be generalized using other abelian groups of size $n+1$. The corresponding
codes are known as Constantin-Rao (CR) codes. In this paper we only consider VT codes, but
most of our results can easily be generalized to CR codes.

The Hamming weight of $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in F_2^n$ is $w(\mathbf{x}) = \#\{m \mid x_m = 1\}$.

We use $\#V_g$ to denote the size of $V_g$ and $A_0^{(g)}, A_1^{(g)}, \ldots, A_n^{(g)}$ to denote its weight distribution,
that is, $A_i^{(g)}$ is the number of codewords in $V_g$ of Hamming weight $i$. Exact formulas for the size
and weight distribution of $V_g$ were first determined by Mazur [8]. They were later generalized
to the larger class of CR codes [9]. In particular, it is known that $\#V_0 \ge \#V_g$ for all $g > 0$.
The codes all have size approximately $2^n/(n + 1)$. More precisely,

2" 2" n 



n + 1 n + 1 

Further 



d\{n+l) 

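where $\varphi(d)$ is Euler's totient function.

Taking only the main term of (2) we get the approximation

$$A_i^{(g)} \approx \frac{1}{n+1}\binom{n}{i}. \tag{3}$$

We let $\mathbf{y} = (y_1, y_2, \ldots, y_n) \le \mathbf{x} = (x_1, x_2, \ldots, x_n)$ denote that $y_m \le x_m$ for $1 \le m \le n$.
When $\mathbf{x}$ is sent, only vectors $\mathbf{y} \le \mathbf{x}$ can be received, and the probability for this to
happen is

$$p^{w(\mathbf{e})}(1-p)^{w(\mathbf{x})-w(\mathbf{e})},$$

where $\mathbf{e} = \mathbf{x} - \mathbf{y}$ is the error vector.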

A systematic version of VT codes was studied in [10]. 

Many properties of these codes, in either systematic or non-systematic form, have been explored
in the past but, to the best of our knowledge, no attention has been paid so far to their error
detection properties.

In this paper, we provide a first contribution toward filling this gap. Our analysis is mainly focused
on the VT code $V_0$. However, we will also give some results for codes $V_g$ with $g \ne 0$. Some
comparisons are also made with the well-known family of Hamming codes, revealing
important performance similarities, both when these codes are used over the Z-channel and
when they are used over the symmetric channel.

The paper is organized as follows. In Section II we introduce the probability of undetected
error, $P_{ue}$. In Section III we give an exact formula for $P_{ue}$ that can be explicitly computed for
small lengths n (up to approximately 25). Next, in Section IV we study good lower bounds that
can be explicitly computed up to almost twice this length (depending on how tight we require the
bounds to be). In Section V we look at the class of Hamming codes, for the sake of comparison,
and consider their application to both the symmetric channel and the asymmetric one; a first
performance comparison with VT codes is made for small lengths. In Section VI we use some
heuristic arguments to give a very good approximation that can easily be computed even for
large lengths. In Section VII we use Monte Carlo methods to obtain other good approximations
for long code lengths; this also permits further comparisons with Hamming codes of
the same length. Finally, in Section VIII some remarks on future research conclude the paper.

II. The probability of undetected error 

For a description of properties of the probability of undetected error, see [11]. In general, an
undetected error occurs when, in the presence of one or more errors, the received sequence coincides
with a codeword different from the transmitted one. In this case the decoder accepts the received
sequence, and information reconstruction is certainly wrong. By the VT code construction, single
errors are always detected, so that undetected errors can appear only when the number of errors
is greater than one.

We note that, if $\mathbf{x} \in V_g$ is sent and $\mathbf{y}$ is received, then $\mathbf{y} \in V_g$ if and only if $\mathbf{e} = \mathbf{x} - \mathbf{y} \in V_0$.
This can be proved by observing that

$$\sum_{m=1}^{n} m e_m = \sum_{m=1}^{n} m x_m - \sum_{m=1}^{n} m y_m \equiv g - g = 0 \pmod{n+1}.$$

Hence, the undetectable errors are exactly the non-zero vectors in $V_0$. For $j \ge 0$, let

$$\mathcal{E}_j(\mathbf{x}) = \{\mathbf{e} \in V_0 \mid w(\mathbf{e}) = j,\ \mathbf{e} \le \mathbf{x}\}.$$

For $j > 0$, this is the set of undetectable errors of weight $j$ when $\mathbf{x}$ is transmitted. We note that
$\mathcal{E}_0(\mathbf{x}) = \{\mathbf{0}\}$ (and $\mathbf{0}$ is not an error vector). Let $\varepsilon_j(\mathbf{x})$ be the size of $\mathcal{E}_j(\mathbf{x})$. Note that since $V_0$
does not contain any vector of weight one, we have $\varepsilon_1(\mathbf{x}) = 0$ for all $\mathbf{x}$. We also have $\varepsilon_0(\mathbf{x}) = 1$
for all $\mathbf{x}$.

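The (average) probability of undetected error is given by

$$P_{ue}(V_g, p) = \frac{1}{\#V_g} \sum_{\mathbf{x} \in V_g} \sum_{j=2}^{w(\mathbf{x})} \varepsilon_j(\mathbf{x})\, p^j (1-p)^{w(\mathbf{x})-j}. \tag{4}$$

In deriving (4), the codewords are assumed equally probable. If we define

$$A_{i,j}^{(g)} = \sum_{\substack{\mathbf{x} \in V_g \\ w(\mathbf{x}) = i}} \varepsilon_j(\mathbf{x}),$$

(4) can be rewritten

$$P_{ue}(V_g, p) = \frac{1}{\#V_g} \sum_{i=2}^{n} \sum_{j=2}^{i} A_{i,j}^{(g)}\, p^j (1-p)^{i-j}. \tag{5}$$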
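To make (4) and (5) concrete, here is a small brute-force sketch that builds the table of coefficients $A_{i,j}^{(g)}$ and evaluates (5). The names `vt_code`, `undetected_error_table` and `p_ue` are hypothetical helpers introduced for illustration; the pairwise comparison of codewords is only feasible for short codes and is not meant to reproduce the authors' program.

```python
from itertools import product


def vt_code(n, g=0):
    """All vectors x of length n with sum(m * x_m) congruent to g modulo n + 1."""
    return [x for x in product((0, 1), repeat=n)
            if sum(m * b for m, b in enumerate(x, start=1)) % (n + 1) == g % (n + 1)]


def undetected_error_table(n, g=0):
    """Return (A, size) where A[i][j] counts pairs (x, e) with x in V_g, w(x) = i,
    e in V_0, w(e) = j and e <= x componentwise."""
    code = vt_code(n, g)
    errors = vt_code(n, 0)
    A = [[0] * (n + 1) for _ in range(n + 1)]
    for x in code:
        i = sum(x)
        for e in errors:
            if all(eb <= xb for eb, xb in zip(e, x)):
                A[i][sum(e)] += 1
    return A, len(code)


def p_ue(n, p, g=0):
    """Evaluate (5): the average probability of undetected error of V_g."""
    A, size = undetected_error_table(n, g)
    total = sum(A[i][j] * p ** j * (1 - p) ** (i - j)
                for i in range(2, n + 1) for j in range(2, i + 1))
    return total / size
```

With this sketch, `p_ue(6, 1.0)` evaluates to $(\#V_0 - 1)/\#V_0$, because for $p = 1$ every non-zero codeword of $V_0$ is received as the all-zero codeword.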



III. Exact evaluation of the undetected error probability 

Conceptually, the simplest way to compute the undetected error probability is the direct
calculation of (4), by first determining the sets $\mathcal{E}_j(\mathbf{x})$. Since both $V_g$ and $V_0$ have size on the
order of $2^n/(n + 1)$, the complexity is on the order of $2^{2n}/(n + 1)^2$.

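A. $P_{ue}(V_0, p)$

We observe that if $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in V_0$, then the reversed vector

$$\mathbf{x}^r = (x_n, x_{n-1}, \ldots, x_1) \in V_0,$$

too, since

$$\sum_{m=1}^{n} m x_{n+1-m} = \sum_{m=1}^{n} (n+1-m) x_m = w(\mathbf{x})(n + 1) - \sum_{m=1}^{n} m x_m \equiv 0 - 0 = 0 \pmod{n+1}.$$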

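This simplifies the calculations and reduces the complexity by some factor, but the order of
magnitude of the complexity is still the same. We will elaborate on this symmetry in the next
section.

Another observation is that if $\mathbf{x} \in V_0$, $w(\mathbf{x}) = i$ and $\mathbf{e} \in \mathcal{E}_j(\mathbf{x})$, with $j \le i$, then $\mathbf{y} = \mathbf{x} - \mathbf{e} \in \mathcal{E}_{i-j}(\mathbf{x})$. Hence,

$$A_{i,j}^{(0)} = A_{i,i-j}^{(0)} \quad \text{for } 0 \le j \le i \le n. \tag{6}$$

This again halves the complexity for $g = 0$. For completeness, we also observe that

$$A_{i,i}^{(0)} = A_i^{(0)} \quad \text{for } 0 \le i \le n, \tag{7}$$
$$A_{i,1}^{(0)} = 0 \quad \text{for } 1 \le i \le n. \tag{8}$$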
For even n, further symmetry properties can be found.

For any vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, the complementary vector $\bar{\mathbf{x}}$ is defined by

$$\bar{\mathbf{x}} = (1 - x_1, 1 - x_2, \ldots, 1 - x_n),$$

that is, $\bar{x}_m = 1$ if $x_m = 0$ and $\bar{x}_m = 0$ if $x_m = 1$. Clearly,

$$w(\bar{\mathbf{x}}) = n - w(\mathbf{x})$$

and

$$\sum_{m=1}^{n} m \bar{x}_m = \frac{n(n+1)}{2} - \sum_{m=1}^{n} m x_m.$$

This implies that if n is even and $\mathbf{x} \in V_0$, then $\bar{\mathbf{x}} \in V_0$. Next, we observe that if $\mathbf{y} \le \mathbf{x}$, then
$\bar{\mathbf{x}} \le \bar{\mathbf{y}}$. In particular, this implies the relation

$$A_{i,j}^{(0)} = A_{n-i+j,\,n-i}^{(0)} \quad \text{for } 0 \le j \le i \le n. \tag{9}$$

We note that this relation is not valid for odd n.

Relations (6) and (9) can be combined. For example, (6) implies that $A_{n-i+j,\,n-i}^{(0)} = A_{n-i+j,\,j}^{(0)}$.
Next, (9) implies that $A_{i,i-j}^{(0)} = A_{n-j,\,n-i}^{(0)}$, etc. Repeated use of (6) and (9) gives the following
result: if n is even and $0 \le j \le i \le n$, then

$$A_{i,j}^{(0)} = A_{i,i-j}^{(0)} = A_{n-i+j,\,n-i}^{(0)} = A_{n-i+j,\,j}^{(0)} = A_{n-j,\,n-i}^{(0)} = A_{n-j,\,i-j}^{(0)}. \tag{10}$$

Putting $i = j$ in (10) and combining with (7) we also get

$$A_{i,i}^{(0)} = A_i^{(0)} = A_{n,i}^{(0)}. \tag{11}$$

Using these relations, for even n, the complexity of the exact calculation for $g = 0$ is further
reduced. For odd n, the same relationships are not valid. Moreover, it should be noted that, for
odd n, we have $A_n^{(0)} = A_{n,j}^{(0)} = 0$.

We have developed a numerical program, in the C++ language, that constructs all the sets
$\mathcal{E}_j(\mathbf{x})$ for $\mathbf{x} \in V_0$ (exploiting the symmetry properties discussed above) and, based on this,
computes the numbers $A_{i,j}^{(0)}$. The values of $P_{ue}(V_0, p)$ as a function of p computed this way are
exact.

Examples of the results obtained are shown in Fig. 1 for n = 10, 15, 20, 25. For small values
of p, up to about 0.2, small values of n give lower probability of undetected error, whilst for
larger values of p the behavior of codes with larger n is better.

For p = 1 all codewords are changed to $\mathbf{0}$, and this implies

$$P_{ue}(V_0, 1) = \frac{\#V_0 - 1}{\#V_0},$$

which is slightly different from 1 because of the presence of the all-zero codeword (which is always
received correctly).

B. $P_{ue}(V_g, p)$ for $g \ne 0$

For $g \ne 0$, $V_g$ does not include the all-zero vector. As a consequence, $P_{ue}(V_g, 1) = 0$ and the
curve of $P_{ue}$ has at least one maximum for p between 0 and 1. As for $V_0$, we have developed a
computer program that permits us to calculate exactly the undetected error probability of these
codes, as a function of p, for values of n that are not too high (so as to keep processing times
acceptable).

In Fig. 2 curves of $P_{ue}(V_1, p)$ are plotted, for some values of n. For better readability, we
have used a linear scale instead of the logarithmic one used in Fig. 1.

These curves have been obtained for $g = 1$, but they remain practically the same for codes
$V_g$ with $g > 1$.

A main reason why $P_{ue}(V_0, p)$ and $P_{ue}(V_1, p)$ behave differently for large p is that $V_0$ contains
the all-zero vector. If we remove the all-zero codeword, that is, consider

$$V_0^* = V_0 \setminus \{\mathbf{0}\}$$

instead, we get:

$$P_{ue}(V_0^*, p) = P_{ue}(V_0, p) - \frac{\sum_{i=2}^{n} A_i^{(0)} p^i - P_{ue}(V_0, p)}{\#V_0 - 1},$$

and code $V_0^*$ has no undetectable errors for p = 1. It turns out that $P_{ue}(V_0^*, p)$ and $P_{ue}(V_1, p)$ are
almost the same for $p \le 0.5$ but they differ somewhat in the region [0.5, 1]. We illustrate this
behavior for n = 20, in Fig. 3.
IV. Lower bounds on $P_{ue}$

Clearly, if we omit or reduce some of the terms in (5), we get a lower bound on $P_{ue}(V_g, p)$.
For example, for a fixed integer $m \ge 2$,

$$P_{ue}(V_g, p) \ge \frac{1}{\#V_g} \sum_{i=2}^{n} \sum_{j=2}^{\min(m,i)} A_{i,j}^{(g)}\, p^j (1-p)^{i-j}.$$

The number of errors of weight j is upper bounded by $A_j^{(0)} \approx \frac{1}{n+1}\binom{n}{j}$. Hence, the complexity
of calculating the coefficients $A_{i,j}^{(g)}$ for $j \le m$ is on the order of

$$\frac{2^n}{(n + 1)^2} \sum_{j=2}^{m} \binom{n}{j}.$$

For small values of m, this is of course much lower than the computations needed to determine
all the $A_{i,j}^{(g)}$, which we estimated to be on the order of $2^{2n}/(n + 1)^2$.

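Next, we describe in detail how to calculate $A_{i,j}^{(0)}$ for j = 2, j = 3 and j = 4. First, we remind
the reader that the support of a vector $\mathbf{e}$ is the set of positions where the vector has ones,
that is

$$\chi(\mathbf{e}) = \{r \mid e_r = 1\}.$$

• Calculus of $A_{i,2}^{(0)}$

If $\mathbf{e} \in V_0$ has weight 2, and

$$\chi(\mathbf{e}) = \{r, s\},$$

where $1 \le r < s \le n$, then, by the definition of the code, we must have

$$r + s = n + 1.$$

Hence $s = n + 1 - r \ge r + 1$ and so $r \le n/2$. Therefore, $\mathbf{e} \in \mathcal{E}_2(\mathbf{x})$ if and only if $x_r = x_{n+1-r} = 1$.
Hence

$$\varepsilon_2(\mathbf{x}) = \sum_{r=1}^{\lfloor n/2 \rfloor} x_r x_{n+1-r},$$

and

$$A_{i,2}^{(0)} = \sum_{\substack{\mathbf{x} \in V_0 \\ w(\mathbf{x}) = i}} \sum_{r=1}^{\lfloor n/2 \rfloor} x_r x_{n+1-r}.$$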

• Calculus of $A_{i,3}^{(0)}$

If $\mathbf{e} \in V_0$ has weight 3, and

$$\chi(\mathbf{e}) = \{r, s, t\},$$

where $1 \le r < s < t \le n$, then we must have

$$r + s + t = n + 1 \quad \text{or} \quad r + s + t = 2(n + 1).$$

We observe that if $\mathbf{e} \le \mathbf{x}$ then, clearly, $\mathbf{e}^r \le \mathbf{x}^r$ (the reversed vectors). Further,

$$\chi(\mathbf{e}^r) = \{n + 1 - t,\ n + 1 - s,\ n + 1 - r\}$$

and

$$(n + 1 - t) + (n + 1 - s) + (n + 1 - r) = 3(n + 1) - (r + s + t).$$

Hence, for each error with support that sums to $n + 1$, there is another (reversed) error with
support that sums to $2(n + 1)$. Therefore, it is sufficient to consider the first kind, this way
deriving a contribution that is exactly half of $A_{i,3}^{(0)}$. If $r + s + t = n + 1$, then we have

$$n + 1 = r + s + t \ge r + (r + 1) + (r + 2)$$

and so $r \le (n - 2)/3$. Further,

$$n + 1 - r = s + t \ge s + (s + 1)$$

and so $s \le (n - r)/2$. Hence, similarly to what we did for j = 2, we get

$$A_{i,3}^{(0)} = 2 \sum_{\substack{\mathbf{x} \in V_0 \\ w(\mathbf{x}) = i}} \sum_{r=1}^{\lfloor (n-2)/3 \rfloor} \sum_{s=r+1}^{\lfloor (n-r)/2 \rfloor} x_r x_s x_{n+1-r-s}.$$
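The summation limits derived for j = 3 can be checked with a short sketch. The function `eps3_formula` below is a hypothetical helper: for a single vector it counts the weight-3 errors whose support sums to $n + 1$ with the bounds $r \le \lfloor (n-2)/3 \rfloor$ and $s \le \lfloor (n-r)/2 \rfloor$, and obtains those whose support sums to $2(n + 1)$ by applying the same count to the reversed vector. It can be compared against a plain enumeration of all 3-subsets of the support.

```python
def count_sum_n_plus_1(x):
    """Number of triples r < s < t in the support of x with r + s + t = n + 1."""
    n = len(x)
    total = 0
    for r in range(1, (n - 2) // 3 + 1):
        for s in range(r + 1, (n - r) // 2 + 1):
            t = n + 1 - r - s  # guaranteed to satisfy s < t <= n
            total += x[r - 1] * x[s - 1] * x[t - 1]  # positions are 1-based
    return total


def eps3_formula(x):
    """Number of weight-3 undetectable errors e <= x (e in V_0)."""
    # Reversal maps supports summing to 2(n + 1) onto supports summing to n + 1.
    return count_sum_n_plus_1(x) + count_sum_n_plus_1(x[::-1])
```

Summing `count_sum_n_plus_1` over the codewords of weight i in $V_0$ and doubling the result corresponds to the double sum for $A_{i,3}^{(0)}$, because $V_0$ is closed under reversal.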



• Calculus of $A_{i,4}^{(0)}$

If $\mathbf{e} \in V_0$ has weight 4, and

$$\chi(\mathbf{e}) = \{r, s, t, u\},$$

where $1 \le r < s < t < u \le n$, then one of the following conditions should be satisfied:

i) $r + s + t + u = n + 1$,

ii) $r + s + t + u = 2(n + 1)$,

iii) $r + s + t + u = 3(n + 1)$.

However, we observe that when vector $\mathbf{e}$ satisfies condition i) then the reversed vector $\mathbf{e}^r$ satisfies
condition iii) and vice versa. In fact:

$$(n + 1 - u) + (n + 1 - t) + (n + 1 - s) + (n + 1 - r) = 4(n + 1) - (r + s + t + u) = 3(n + 1).$$

Hence, for each error with support that sums to $n + 1$, there is another (reversed) error with
support that sums to $3(n + 1)$. Therefore, it is sufficient to consider condition i) and then double
the count so found, to take condition iii) into account as well. If $r + s + t + u = n + 1$, we have

$$n + 1 = r + s + t + u \ge r + (r + 1) + (r + 2) + (r + 3) = 4r + 6$$

and so $r \le (n - 5)/4$. On the other hand,

$$n + 1 - r = s + t + u \ge s + (s + 1) + (s + 2) = 3s + 3$$

and so $s \le (n - 2 - r)/3$. Finally

$$n + 1 - r - s = t + u \ge t + (t + 1) = 2t + 1$$

and so $t \le (n - r - s)/2$.

Similarly, we can consider condition ii). It implies:

$$2n + 2 = r + s + t + u \ge 4r + 6$$

and so $r \le (2n - 4)/4$. Further

$$2n + 2 - r = s + t + u \ge 3s + 3$$

and so $s \le (2n - 1 - r)/3$. Finally

$$2n + 2 - r - s = t + u \ge 2t + 1$$

and so $t \le (2n + 1 - r - s)/2$.
On the basis of this analysis, the expression of $A_{i,4}^{(0)}$ can be written as follows:

$$A_{i,4}^{(0)} = 2 \sum_{\substack{\mathbf{x} \in V_0 \\ w(\mathbf{x}) = i}} \sum_{r=1}^{\lfloor (n-5)/4 \rfloor} \sum_{s=r+1}^{\lfloor (n-2-r)/3 \rfloor} \sum_{t=s+1}^{\lfloor (n-r-s)/2 \rfloor} x_r x_s x_t x_{n+1-r-s-t}$$
$$\qquad + \sum_{\substack{\mathbf{x} \in V_0 \\ w(\mathbf{x}) = i}} \sum_{r=1}^{\lfloor (2n-4)/4 \rfloor} \sum_{s=r+1}^{\lfloor (2n-1-r)/3 \rfloor} \sum_{t=\max(s+1,\,n+2-r-s)}^{\lfloor (2n+1-r-s)/2 \rfloor} x_r x_s x_t x_{2n+2-r-s-t}.$$

It should be noted that, in the inner sum of the second contribution (the one due to condition
ii)), we have explicitly taken into account that t cannot be smaller than $n + 2 - r - s$; this is
because the following obvious condition must be satisfied

$$u = 2(n + 1) - (r + s + t) \le n,$$

which implies

$$n + 2 - r - s \le t.$$

Additionally, the sums appearing in the expressions of $A_{i,j}^{(0)}$ are null when the upper limit
is smaller than the lower limit. So, the first contribution in $A_{i,4}^{(0)}$ is not present for $n \le 8$, and
the second contribution also disappears (as is obvious) for $n \le 3$.
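The nested summation limits for j = 4 translate directly into loops. The sketch below uses hypothetical helper names (`count_condition_i`, `count_condition_ii`, `eps4_formula`) and counts, for a single vector, the weight-4 errors under conditions i), ii) and iii), with condition iii) handled through the reversed vector. It is meant only as a check of the index ranges against a plain enumeration of 4-subsets.

```python
def count_condition_i(x):
    """Quadruples r < s < t < u in the support of x with r + s + t + u = n + 1."""
    n = len(x)
    b = (0,) + tuple(x)  # shift to 1-based positions
    total = 0
    for r in range(1, (n - 5) // 4 + 1):
        for s in range(r + 1, (n - 2 - r) // 3 + 1):
            for t in range(s + 1, (n - r - s) // 2 + 1):
                total += b[r] * b[s] * b[t] * b[n + 1 - r - s - t]
    return total


def count_condition_ii(x):
    """Quadruples r < s < t < u in the support of x with r + s + t + u = 2(n + 1)."""
    n = len(x)
    b = (0,) + tuple(x)
    total = 0
    for r in range(1, (2 * n - 4) // 4 + 1):
        for s in range(r + 1, (2 * n - 1 - r) // 3 + 1):
            # The lower limit n + 2 - r - s keeps u = 2n + 2 - r - s - t within 1..n.
            for t in range(max(s + 1, n + 2 - r - s), (2 * n + 1 - r - s) // 2 + 1):
                total += b[r] * b[s] * b[t] * b[2 * n + 2 - r - s - t]
    return total


def eps4_formula(x):
    """Number of weight-4 undetectable errors e <= x (e in V_0)."""
    # Condition iii) for x is condition i) for the reversed vector.
    return count_condition_i(x) + count_condition_i(x[::-1]) + count_condition_ii(x)
```

For the all-one vector, `count_condition_i` returns 0 for n = 8 and 1 for n = 9 (the support {1, 2, 3, 4}), in line with the remark that the first contribution vanishes for short lengths.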

Though the procedure adopted to derive $A_{i,2}^{(0)}$, $A_{i,3}^{(0)}$ and $A_{i,4}^{(0)}$ is quite clear and, in principle,
can be extended to the other values of j, it is easy to see that, formally, the analysis becomes
more and more tedious for increasing j. Similarly, explicit formulas can be given for $A_{i,j}^{(g)}$ for
$g \ne 0$, but they are usually somewhat more complicated. The formula for j = 2 generalizes
immediately to

$$A_{i,2}^{(g)} = \sum_{\substack{\mathbf{x} \in V_g \\ w(\mathbf{x}) = i}} \sum_{r=1}^{\lfloor n/2 \rfloor} x_r x_{n+1-r}.$$

However, for j = 3 we used above the symmetry that only appears in $V_0$, and so the formula
for $A_{i,3}^{(g)}$ will contain two sums in the expression. Similarly for $j \ge 4$.
For $V_0$ we have computed some lower bounds for n = 10, 15, 20 and 25 to see how good the
bounds are, in comparison with the exact values. In the lower bound we have used $A_{i,2}^{(0)}$, $A_{i,3}^{(0)}$,
and $A_{i,4}^{(0)}$ for $4 \le i \le n$ computed by the formulas above, $A_{i,i-2}^{(0)}$, $A_{i,i-3}^{(0)}$, $A_{i,i-4}^{(0)}$, which have the
same values by (6), and finally $A_{i,i}^{(0)}$ obtained from (7). The remaining terms have been set to
zero. The lower bounds and the exact values are compared in Fig. 4.

From the figure we see that the lower bound is an excellent approximation of the true behavior
for n = 10 (the exact curve and the bound are superposed), that the approximation is very good
for n = 15, but that the difference between the exact curve and the estimated one becomes more
and more evident for increasing n. Qualitatively, such a trend is to be expected.
In particular, the lower bound for n = 25 exhibits an oscillation, in the central region, which is
due to the terms neglected, whose effect is particularly important in the neighborhood of p = 0.5.
On the other hand, it is easy to verify that the approximation is very good, independently of n,
for small values of the channel error probability p. Even the simple bound using only $A_{i,2}^{(0)}$ gives
a good approximation for small p.

V. Comparison with Hamming codes and relationship with the symmetric channel

Hamming codes are another well-known class of single error correcting codes, widely used
both in symmetric and asymmetric channels. In particular, they are known to be optimal error
detecting codes for the binary symmetric channel (BSC) [12].

The length of a binary Hamming code H is $n = 2^r - 1$, where r is the number of parity check
bits, while the number of codewords (i.e., the size of the code) is $M = 2^k$, with $k = 2^r - 1 - r$.
For a description of Hamming codes and their properties see, for example, [13].

The dual codes of Hamming codes are maximal length (or simplex) codes, which means that
the generator matrix of a Hamming code can be used as the parity check matrix of a maximal
length code, and vice versa.

The weight distribution of Hamming codes is known:

$$A_i = \frac{1}{n+1}\left[\binom{n}{i} + n(-1)^{\lceil i/2 \rceil}\binom{(n-1)/2}{\lfloor i/2 \rfloor}\right].$$

Here $\lceil x \rceil$ denotes the smallest integer m such that $m \ge x$.
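The weight distribution formula is easy to evaluate numerically. The following sketch (the function name `hamming_weight_distribution` is a hypothetical helper) computes the list $A_0, \ldots, A_n$ for the Hamming code with r parity check bits, using integer arithmetic throughout.

```python
from math import comb


def hamming_weight_distribution(r):
    """Weight distribution A_0..A_n of the binary Hamming code of length n = 2^r - 1."""
    n = 2 ** r - 1
    dist = []
    for i in range(n + 1):
        # (i + 1) // 2 equals ceil(i / 2) for non-negative integers i.
        sign = -1 if ((i + 1) // 2) % 2 else 1
        numerator = comb(n, i) + n * sign * comb((n - 1) // 2, i // 2)
        dist.append(numerator // (n + 1))
    return dist
```

For r = 3 this gives the familiar distribution of the length-7 Hamming code, with seven codewords of weight 3 and seven of weight 4; for any r the entries sum to $2^k$.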
When the code is used over the BSC, this allows us to find an explicit expression for the
probability of undetected error [11, p. 44], namely:

$$P_{ue}^{BSC}(H, p) = \frac{1}{n+1}\left[1 + n(1 - 2p)^{(n+1)/2}\right] - (1 - p)^n. \tag{12}$$

In this expression, p represents the probability that a 1 is changed to a 0 or that a 0 is changed
to a 1.
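As a sanity check of (12), the sketch below compares the closed form with the direct sum $\sum_{i \ge 1} A_i p^i (1-p)^{n-i}$ over the weight distribution of the Hamming code. Both function names are hypothetical helpers introduced for this illustration.

```python
from math import comb


def p_ue_bsc_hamming(r, p):
    """Closed form (12) for the Hamming code of length n = 2^r - 1 over the BSC."""
    n = 2 ** r - 1
    return (1 + n * (1 - 2 * p) ** ((n + 1) // 2)) / (n + 1) - (1 - p) ** n


def p_ue_bsc_from_weights(r, p):
    """Same quantity computed as the sum of A_i * p^i * (1 - p)^(n - i) over i >= 1."""
    n = 2 ** r - 1
    total = 0.0
    for i in range(1, n + 1):
        sign = -1 if ((i + 1) // 2) % 2 else 1  # (-1)^ceil(i/2)
        a_i = (comb(n, i) + n * sign * comb((n - 1) // 2, i // 2)) // (n + 1)
        total += a_i * p ** i * (1 - p) ** (n - i)
    return total
```

The two functions agree up to floating-point rounding; for example, for r = 3 and p = 0.5 both give 15/128, since every one of the 15 non-zero codewords is an equally likely error pattern.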

However, as for VT codes, an explicit expression for $P_{ue}(H, p)$ is not available for the case of
the Z-channel. Similarly to what was done in Section III, we have developed a numerical program,
in the C++ language, that evaluates, exhaustively, all transitions yielding undetected errors.
The procedure is conceptually similar to that described in Section III for VT codes, and an
expression like (5) still holds, as an undetected error occurs if and only if the error vector
belongs to H.

The curve of $P_{ue}(H, p)$ can be compared, for a fixed n, with that of $P_{ue}(V_0, p)$. An example is
shown in Fig. 5 for n = 15; both codes have the same number of codewords, i.e., $\#V_0 = M =
2048$. The two curves are rather similar, but the performance of the Hamming code is slightly
better. In Section VII we will make a comparison for a larger n. There we show that for n = 127
both curves are dominated by a nearly flat region in the neighborhood of p = 0.5. The extent of
the nearly flat region becomes wider and wider for increasing n. The rationale for the existence
of the nearly flat region in the curve of $P_{ue}$ is given in the next section.

VI. Heuristic approximations 

The lower bound discussed in Section IV neglects all the events caused by j errors
where $5 \le j \le i - 5$. As a consequence, the approximation is good for small p (and,
symmetrically, for large p) but it becomes less and less reliable in the central region of p values.

Another approach is to find some good approximation of $A_{i,j}^{(0)}$ by some heuristic argument.
By (6)-(8), we only have to consider j in the range $2 \le j \le i/2$.

First, we observe that a vector $\mathbf{e}$ of weight j is contained in $\binom{n-j}{i-j}$ vectors $\mathbf{x}$ of weight i. Each
such vector $\mathbf{x}$ is contained in some code $V_g$. Since there are $A_j^{(0)}$ vectors $\mathbf{e} \in V_0$ of weight j,
we get

$$\binom{n-j}{i-j} A_j^{(0)} = \sum_{g=0}^{n} A_{i,j}^{(g)}. \tag{13}$$
Now (and this is the heuristic argument), we assume that the ratio between the number of
undetectable errors of weight j in $V_g$ and the overall number of errors of weight j (given by
(13)), starting from codewords $\mathbf{x}$ of weight i, is approximately equal to the ratio between the
number of codewords of weight i in $V_g$ and the total number of codewords of weight i. In
particular, for $V_0$, this means assuming:

$$\frac{A_{i,j}^{(0)}}{\binom{n-j}{i-j} A_j^{(0)}} \approx \frac{A_i^{(0)}}{\binom{n}{i}}.$$

Hence, under our assumption, we get

$$A_{i,j}^{(0)} \approx \frac{\binom{n-j}{i-j}}{\binom{n}{i}} A_j^{(0)} A_i^{(0)}. \tag{14}$$

This approximation can be computed using (2). We observe that for i = n, we have equality in
(14), that is,

$$A_{n,j}^{(0)} = A_j^{(0)} A_n^{(0)}.$$

Even more simply, as an alternative to using (2), one can combine (14) with the approximations
for $A_i^{(0)}$ and $A_j^{(0)}$ given by (3) and get

$$A_{i,j}^{(0)} \approx \frac{1}{(n+1)^2}\binom{n}{i}\binom{i}{j} \tag{15}$$

for $2 \le j \le i - 2$ while, using (7), we get

$$A_{i,i}^{(0)} \approx \frac{1}{n+1}\binom{n}{i}. \tag{16}$$

Finally, in the case of even n, using (11), we get

$$A_{n,j}^{(0)} \approx \frac{1}{n+1}\binom{n}{j}. \tag{17}$$

The heuristic argument is supported by a good deal of simulation evidence. Just as an example,
in Fig. 6 we show the comparison between the exact values of $A_{i,j}^{(0)}$ and those derived from the
heuristic approximation, as a function of i, for n = 20 and some values of j, namely j = i
(i.e., using (16) for the heuristic approximation), and j = 2, 3, 4 (i.e., using (15) and (17) for
the heuristic approximation). The heuristic values have been interpolated by continuous lines for
the sake of readability. The figure shows that the agreement between the approximated values
and the exact ones is very good. Though it refers to a particular case, this conclusion is quite
general, and we have verified it also for the other values of j and for different n (for example,
n = 25). From a theoretical point of view, the heuristic argument can be seen as an instance of
the "random coding" approach, which has also been used recently, over the Z-channel, to extend
the concept of Maximum Likelihood decoding [14]. As the practical significance of random
coding increases with the size of the code, we can expect the heuristic
argument to remain valid for larger values of n.

In practice, the best approach for moderate n is to determine some $A_{i,j}^{(0)}$ explicitly, as outlined
above, e.g. for $j \le 4$, combined with the symmetry relations of Section III, and to use one of the approximations in (14)
and (15) for the remaining $A_{i,j}^{(0)}$. For n = 25 we have done this, with exact values for $A_{i,j}^{(0)}$ and
$A_{i,i-j}^{(0)}$ for $0 \le j \le 4$, and with the approximation in (14) for the remaining $A_{i,j}^{(0)}$. For this case,
the exact values and the approximations are very close, and if we draw both in a graph it is not
possible to distinguish between them. The maximal percentage difference between the curves is
less than 0.065%; the maximum occurs in the neighborhood of p = 0.5.

For our heuristic approximation given in terms of the binomial coefficients, we can find a
closed formula. The analytical details are given in Appendix I; using the approximations in (15)
and (16) we get the following expression, where $P_{ue}^{h}$ denotes the heuristic approximation of $P_{ue}$:

$$(n + 1)^2\, \#V_0\, P_{ue}^{h}(V_0, p) = 2^n - (2 - p)^n - np(2 - p)^{n-1} + 2np(1 + p)^{n-1} - 2np - n(n - 1)p^2. \tag{18}$$

It is easy to see that, for n sufficiently large and except for p close to zero or one, the first
term on the right-hand side is much larger than the others. So, taking into account that $(n + 1)^2\, \#V_0 \approx
(n + 1)2^n$, we have $P_{ue}^{h}(V_0, p) \approx 1/(n+1)$. This statement can be made more precise. We also see
that

$$P_{ue}^{h}(V_0, 0) = 0 = P_{ue}(V_0, 0)$$

and

$$P_{ue}^{h}(V_0, 1) = \frac{2^n/(n+1) - 1}{\#V_0} \approx \frac{\#V_0 - 1}{\#V_0} = P_{ue}(V_0, 1).$$

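Now, let us consider the derivative of $P_{ue}^{h}(V_0, p)$. With simple algebra, we get

$$(n + 1)^2\, \#V_0\, \frac{d}{dp} P_{ue}^{h}(V_0, p) = n(2 - p)^{n-1} - \left[n(2 - p)^{n-1} - n(n - 1)p(2 - p)^{n-2}\right]$$
$$\qquad + 2n(1 + p)^{n-1} + 2n(n - 1)p(1 + p)^{n-2} - 2n - 2n(n - 1)p$$
$$\qquad = 2n(1 + np)\left[(1 + p)^{n-2} - 1\right] + n(n - 1)p(2 - p)^{n-2} + 2np.$$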

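In particular, we see that $\frac{d}{dp} P_{ue}^{h}(V_0, p) > 0$ for all $p \in (0, 1)$; hence $P_{ue}^{h}(V_0, p)$ is increasing
with p. Moreover, it is possible to show that $P_{ue}^{h}(V_0, p)$ exhibits a nearly flat region on the interval
$[1/\sqrt{n},\ 1 - 1/\sqrt{n}]$. This can be proved by considering that, for large n, the following approximations
hold (see Appendix II for demonstration):

$$P_{ue}^{h}\!\left(V_0, \tfrac{1}{\sqrt{n}}\right) \approx \frac{2^n}{(n + 1)^2\, \#V_0}\left[1 - \left(1 + \frac{\sqrt{n}}{2}\right) e^{-\frac{\sqrt{n}}{2} - \frac{1}{8}}\right],$$

$$P_{ue}^{h}\!\left(V_0, 1 - \tfrac{1}{\sqrt{n}}\right) \approx \frac{2^n}{(n + 1)^2\, \#V_0}\left[1 + n\, e^{-\frac{\sqrt{n}}{2} - \frac{1}{8}}\right].$$

By using the approximation $\#V_0 \approx 2^n/(n + 1)$, the common factor in front of the brackets becomes $1/(n+1)$.
Therefore

$$P_{ue}^{h}\!\left(V_0, 1 - \tfrac{1}{\sqrt{n}}\right) - P_{ue}^{h}\!\left(V_0, \tfrac{1}{\sqrt{n}}\right) \approx \frac{1}{n + 1}\left(n + \frac{\sqrt{n}}{2} + 1\right) e^{-\frac{\sqrt{n}}{2} - \frac{1}{8}} \to 0$$

for $n \to \infty$. Combined with the fact that $P_{ue}^{h}(V_0, p)$ is increasing, this confirms the existence of
a nearly flat region on the interval $[1/\sqrt{n},\ 1 - 1/\sqrt{n}]$. For example, if n = 509, then $1/\sqrt{n} \approx 0.044$,
and (18) gives

$$P_{ue}^{h}(V_0, 1/\sqrt{n}) \approx 0.001961, \qquad P_{ue}^{h}(V_0, 1 - 1/\sqrt{n}) \approx 0.001972.$$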
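The closed form (18) is simple enough to evaluate directly. The sketch below is an illustration under one stated assumption: it replaces $\#V_0$ by the approximation $2^n/(n+1)$ used in the text, so that the normalizing factor becomes $(n+1)2^n$. The function name `p_ue_heuristic` is hypothetical, and plain floating-point arithmetic limits it to lengths for which $2^n$ is representable as a double.

```python
from math import sqrt


def p_ue_heuristic(n, p):
    """Heuristic approximation of P_ue(V_0, p) from (18), with #V_0 ~ 2^n / (n + 1)."""
    rhs = (2.0 ** n
           - (2 - p) ** n
           - n * p * (2 - p) ** (n - 1)
           + 2 * n * p * (1 + p) ** (n - 1)
           - 2 * n * p
           - n * (n - 1) * p ** 2)
    # (n + 1)^2 * #V_0 is approximated by (n + 1) * 2^n.
    return rhs / ((n + 1) * 2.0 ** n)


if __name__ == "__main__":
    n = 509
    for p in (1 / sqrt(n), 0.5, 1 - 1 / sqrt(n)):
        print(f"p = {p:.4f}  P_ue^h ~ {p_ue_heuristic(n, p):.6f}")
```

Evaluating the function at the two ends of the interval for n = 509 gives a quick way to compare against the figures quoted in the text and against $1/(n+1)$ in the middle of the range.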
It is interesting to observe that the existence of a nearly flat region for the probability of
undetected error can also be proved, in general terms, for any linear (or even non-linear) code
over the BSC. Demonstration is given in Appendix III.

For Hamming codes, in particular, the existence of a nearly flat region in the function $P_{ue}^{BSC}(H, p)$,
given by (12), on the interval $[1/\sqrt{n},\ 1 - 1/\sqrt{n}]$, can be proved through arguments similar to those
used above for the VT codes. In this case, the derivative of the probability of undetected error
can be expressed as follows [11, p. 44]:



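$$\frac{d}{dp} P_{ue}^{BSC}(H, p) = n\left[(1 - p)^{n-1} - (1 - 2p)^{(n-1)/2}\right] = n\left\{\left[(1 - p)^2\right]^{(n-1)/2} - (1 - 2p)^{(n-1)/2}\right\}.$$

Since $(1 - p)^2 > (1 - 2p)$, it follows that

$$\frac{d}{dp} P_{ue}^{BSC}(H, p) > 0 \quad \text{for all } p \in (0, 1),$$

and so $P_{ue}^{BSC}(H, p)$ is increasing with p. If we consider the values of $P_{ue}^{BSC}(H, p)$ for $p = 1/\sqrt{n}$
and $p = 1 - 1/\sqrt{n}$ we can prove that, for large n, the following approximations hold:

$$P_{ue}^{BSC}\!\left(H, \tfrac{1}{\sqrt{n}}\right) \approx \frac{1 + n\, e^{-\sqrt{n} - 1}\left(1 - \sqrt{e}\right)}{n + 1},$$

$$P_{ue}^{BSC}\!\left(H, 1 - \tfrac{1}{\sqrt{n}}\right) \approx \frac{1 + n\, e^{-\sqrt{n} - 1}}{n + 1}.$$

Demonstration is given in Appendix II. It follows that

$$P_{ue}^{BSC}\!\left(H, 1 - \tfrac{1}{\sqrt{n}}\right) - P_{ue}^{BSC}\!\left(H, \tfrac{1}{\sqrt{n}}\right) \approx \frac{n}{n + 1}\, e^{-\sqrt{n} - 1}\left(1 - 1 + \sqrt{e}\right) = \frac{n}{n + 1}\, e^{-\sqrt{n} - \frac{1}{2}}.$$

Therefore

$$P_{ue}^{BSC}\!\left(H, 1 - \tfrac{1}{\sqrt{n}}\right) - P_{ue}^{BSC}\!\left(H, \tfrac{1}{\sqrt{n}}\right) \to 0$$

for $n \to \infty$. Combined with the fact that $P_{ue}^{BSC}(H, p)$ is increasing, this confirms the existence
of a nearly flat region on the interval $[1/\sqrt{n},\ 1 - 1/\sqrt{n}]$ also in this case. In such region,

$$P_{ue}^{BSC}(H, p) \approx \frac{1}{n + 1}.$$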
This is the same approximate value determined above for $P_{ue}^{h}(V_0, p)$. However, it is possible to
verify that, for a fixed n, the extent of the region where $P_{ue}^{BSC}(H, p)$ is almost constant is larger
than that where $P_{ue}^{h}(V_0, p)$ is almost constant.

Because of the lack of an explicit formula, it is not possible to demonstrate analytically that
the same nearly flat region also appears when the Hamming code is used over the Z-channel.
However, the simulations described in the next section indicate that this is the case. So, assuming
this, we can say that, even keeping in mind the different meaning of p over the symmetric and
the asymmetric channels, the curves of the probability of undetected error for VT codes and
Hamming codes of the same length over the Z-channel and those for Hamming codes over
the BSC are almost constant, and practically superposed, in a wide region of the channel error
probability.

VII. Performance simulation 

In the previous section we have shown that the heuristic approach provides a very good
approximation for the case of small code lengths. Testing the reliability of the heuristic approximation
for large lengths, through a comparison with the exact results, is impossible, as the exhaustive
analysis already becomes too complex for n > 30. For large lengths, however, it can be useful to
resort to a Monte Carlo like method, that is, to develop a simulator. The simulator replicates
the behavior of a "real" system, and gives an estimate of the unknown probability as the ratio
between the number of undetected errors and the number of simulated codewords.

A rule must be established to construct the code from the information sequence. The simplest
way to convert an information frame into a codeword is to apply a systematic encoding.
Systematic VT codes have been studied in [10]. As recalled in Section V, in a systematic code,
every codeword consists of a k-bit information vector and an r-bit parity check vector. In [10],
a systematic encoding procedure for VT codes of length n and $r = \lceil \log_2(n + 1) \rceil$ was given.
This is, basically, the same value of r that is obtained with conventional Hamming codes (see Section V).

The systematic encoding procedure described in [10] is very simple: the $k = n - \lceil \log_2(n + 1) \rceil$
information bits are set in the positions:

$$I = \{1, \ldots, n\} \setminus \{2^j : j = 0, 1, \ldots, \lceil \log_2(n + 1) \rceil - 1\}.$$

I defines a maximal standard information set for the VT code, i.e., it ensures that the value of k
is maximum. The remaining positions are occupied by the parity check bits, whose values are
determined in such a way as to satisfy (1).

In general, the codewords of the systematic code, for a given value of n, are a subset of
those obtainable through the solution of (1). On the other hand, it is evident that any codeword
of $V_0$ can be a codeword of the systematic code: in practice, many information sequences can
be encoded into more than one codeword of $V_0$. As an example, for n = 10, the information
sequence (011001) can be equivalently encoded into (1000110001) or into (0001110101). When
using the systematic code, one option should be chosen, when necessary, in order to define the
codewords uniquely. For our simulation purposes, however, the goal is to generate the codewords
of $V_0$ according to a uniform distribution. To this end, we do not adopt any selection rule;
on the contrary, when an information sequence is randomly generated for transmission over the
Z-channel, all its possible encodings are considered. In this way the simulation, which for high values of
n necessarily corresponds to sampling a subset of $V_0$, does not exhibit any "polarization effect"
and the simulated scenario closely resembles that of the analytical model (and the heuristic
argument, in particular).
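The non-uniqueness of the systematic encoding can be reproduced with a few lines of code. The sketch below is not the procedure of [10]; it simply tries every assignment of the parity positions $2^j$ and keeps those satisfying (1) with g = 0. The function name `all_encodings` is a hypothetical helper, and `n.bit_length()` is used as an integer way of computing $\lceil \log_2(n + 1) \rceil$.

```python
from itertools import product


def all_encodings(n, info_bits):
    """Return every codeword of V_0 whose information positions carry info_bits."""
    r = n.bit_length()                      # equals ceil(log2(n + 1)) for n >= 1
    parity = [2 ** j for j in range(r)]     # parity check positions 1, 2, 4, ...
    info = [m for m in range(1, n + 1) if m not in parity]
    if len(info_bits) != len(info):
        raise ValueError(f"expected {len(info)} information bits")
    x = [0] * (n + 1)                       # x[0] is unused; positions are 1-based
    for pos, bit in zip(info, info_bits):
        x[pos] = bit
    codewords = []
    for choice in product((0, 1), repeat=r):
        for pos, bit in zip(parity, choice):
            x[pos] = bit
        if sum(m * x[m] for m in range(1, n + 1)) % (n + 1) == 0:
            codewords.append("".join(str(b) for b in x[1:]))
    return codewords
```

For n = 10 and the information sequence (011001), `all_encodings(10, [0, 1, 1, 0, 0, 1])` yields the two codewords quoted in the text, 1000110001 and 0001110101.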

First, we have verified these conjectures by simulating the code with n = 25, which is the longest
code for which we have presented the exact result; as shown in Fig. 7, the simulated
points are everywhere superposed on the exact curve.

Then, and most importantly, simulation has permitted us to study much longer codes. We
have analyzed lengths up to n = 509 (which corresponds, according to the systematic rule, to
k = 500). In order to ensure a satisfactory statistical confidence level for the simulated $P_{ue}(V_0, p)$,
each simulation has been stopped after having found 50000 undetected errors.
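A Monte Carlo estimate of this kind can be sketched as follows. This is only an illustration under stated assumptions: instead of the systematic encoding used in the paper, codewords of $V_0$ are drawn uniformly by rejection sampling (random vectors are discarded until one satisfies (1) with g = 0), and the run length is fixed in advance rather than stopped after a given number of undetected errors. The function name `simulate_p_ue` and its parameters are hypothetical.

```python
import random


def simulate_p_ue(n, p, trials, seed=0):
    """Estimate P_ue(V_0, p) over the Z-channel by simulating `trials` transmissions."""
    rng = random.Random(seed)

    def in_v0(v):
        return sum(m * b for m, b in enumerate(v, start=1)) % (n + 1) == 0

    undetected = 0
    for _ in range(trials):
        # Rejection sampling: uniform over F_2^n restricted to V_0.
        while True:
            x = [rng.getrandbits(1) for _ in range(n)]
            if in_v0(x):
                break
        # Z-channel: each 1 becomes 0 with probability p; zeros are never changed.
        y = [0 if (b == 1 and rng.random() < p) else b for b in x]
        # Undetected error: the received vector is a codeword different from x.
        if y != x and in_v0(y):
            undetected += 1
    return undetected / trials
```

Rejection sampling accepts roughly one vector in n + 1, which is acceptable for a sketch; for p = 0 the estimate is exactly zero, since the received vector always equals the transmitted one.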

The simulated curves for these long codes generally show a wide nearly flat region, for
intermediate values of p, as expected from the heuristic analysis. Some examples of the numerical
results obtained, confirming the above considerations, are given in Table I. To make this clearer,
in Fig. 8 we have plotted the heuristic approximation and the simulated values for n = 509. We
see that the approximation is excellent also in this case.

Finally, we can compare the performance of VT codes with that of Hamming codes of the
same length, above all to demonstrate the (quasi) coincidence of the nearly constant
value. An example, for n = 127, is shown in Fig. 9; the continuous line represents $P_{ue}^{h}(V_0, p)$,
while dots represent some simulated points for $P_{ue}(H, p)$. As expected, the latter curve also
exhibits a wide nearly flat region, and the values of both functions are practically the same in
this region. Moreover, this value is also approximately equal to $1/(n + 1) = 0.0078125$ which, as
proved in Section VI, approximates $P_{ue}^{BSC}(H, p)$ almost everywhere, except for values of p close
to zero or close to one.

VIII. Conclusion 

This paper is a first attack on the problem of evaluating the undetected error probability of
Varshamov-Tenengol'ts codes. We have presented some methods that allow us to obtain exact
results (for short codes) and heuristic and simulated approximate results (for long codes). We
have shown that the proposed heuristic approximation is excellent for small n, and very good
even for large n.

We have verified that the probability of undetected error is almost constant in a wide region of
values of the channel error probability, and this region becomes larger and larger for increasing
n. Such behavior is common to other codes over the Z-channel, and can be found even in the
case of a generic, linear or non-linear, code over the symmetric channel. Thus, we can conclude
that, except for the region of a channel error probability close to zero or one, the probability
of undetected error tends to assume the same value, approximately equal to the reciprocal of
the code length, independently of the code and of the symmetry properties of the channel.
Further work is advisable to confirm these conclusions on other codes. As regards VT
codes, though their error detection properties appear to be captured by the analytical and numerical
approaches proposed in this paper, it remains a valuable task to find closed form expressions for
the quantities $A_{i,j}^{(0)}$, or even $A_{i,j}^{(g)}$ for all g, in such a way as to be able to compute the undetected
error probability exactly for any code length.

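Appendix I: On the heuristic approximation $\tilde{P}_{ue}(V_0,p)$

Let us consider (5), by assuming g = 0 and substituting the approximation (16) for j = i and
(15) for j ≠ i; so, we get:

$$W_0 \tilde{P}_{ue}(V_0,p) = \frac{1}{n+1}\sum_{i=2}^{n}\binom{n}{i}p^{i} + \frac{1}{(n+1)^2}\sum_{i=4}^{n}\sum_{j=2}^{i-2}\binom{n}{i}\binom{i}{j}p^{j}(1-p)^{i-j}.$$

Through simple algebra, we have:

$$\begin{aligned}
(n+1)^2 W_0 \tilde{P}_{ue}(V_0,p)
&= (n+1)\sum_{i=2}^{n}\binom{n}{i}p^{i} + \sum_{i=4}^{n}\binom{n}{i}\sum_{j=2}^{i-2}\binom{i}{j}p^{j}(1-p)^{i-j} \\
&= (n+1)\sum_{i=2}^{n}\binom{n}{i}p^{i} + \sum_{i=4}^{n}\binom{n}{i}\left[1-(1-p)^{i}-ip(1-p)^{i-1}-ip^{i-1}(1-p)-p^{i}\right] \\
&= (n+1)\left[(1+p)^{n}-1-np\right] + \left[2^{n}-1-n-\binom{n}{2}-\binom{n}{3}\right] \\
&\quad - \left[(2-p)^{n}-1-n(1-p)-\binom{n}{2}(1-p)^{2}-\binom{n}{3}(1-p)^{3}\right] \\
&\quad - np\left[(2-p)^{n-1}-1-(n-1)(1-p)-\binom{n-1}{2}(1-p)^{2}\right] \\
&\quad - n(1-p)\left[(1+p)^{n-1}-1-(n-1)p-\binom{n-1}{2}p^{2}\right] \\
&\quad - \left[(1+p)^{n}-1-np-\binom{n}{2}p^{2}-\binom{n}{3}p^{3}\right] \\
&= 2^{n}-(2-p)^{n}-np(2-p)^{n-1}+2np(1+p)^{n-1}-2np-n(n-1)p^{2}.
\end{aligned}$$

Hence

$$(n+1)^2 W_0 \tilde{P}_{ue}(V_0,p) = 2^{n}-(2-p)^{n}-np(2-p)^{n-1}+2np(1+p)^{n-1}-2np-n(n-1)p^{2}.$$

This is the expression given in (18) in Section VI.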
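
The algebra above can be checked numerically. The following sketch assumes that the heuristic approximation, scaled by $(n+1)^2 W_0$, equals $(n+1)\sum_{i=2}^{n}\binom{n}{i}p^i + \sum_{i=4}^{n}\sum_{j=2}^{i-2}\binom{n}{i}\binom{i}{j}p^{j}(1-p)^{i-j}$, and compares this double sum with the closed form $2^n-(2-p)^n-np(2-p)^{n-1}+2np(1+p)^{n-1}-2np-n(n-1)p^2$. The function names are hypothetical and introduced only for illustration.

```python
from math import comb, isclose


def scaled_heuristic_double_sum(n, p):
    """(n+1)^2 * W_0 * heuristic P_ue, evaluated directly from the two sums."""
    first = (n + 1) * sum(comb(n, i) * p ** i for i in range(2, n + 1))
    second = sum(
        comb(n, i) * comb(i, j) * p ** j * (1 - p) ** (i - j)
        for i in range(4, n + 1)
        for j in range(2, i - 1)  # j runs from 2 to i - 2 inclusive
    )
    return first + second


def scaled_heuristic_closed_form(n, p):
    """Same quantity, evaluated from the closed-form expression."""
    return (
        2 ** n
        - (2 - p) ** n
        - n * p * (2 - p) ** (n - 1)
        + 2 * n * p * (1 + p) ** (n - 1)
        - 2 * n * p
        - n * (n - 1) * p ** 2
    )


if __name__ == "__main__":
    for n in (4, 10, 25):
        for p in (0.1, 0.5, 0.9):
            a = scaled_heuristic_double_sum(n, p)
            b = scaled_heuristic_closed_form(n, p)
            print(n, p, a, b, isclose(a, b, rel_tol=1e-9))
```

Dividing either value by $(n+1)^2 W_0$, with $W_0$ the size of $V_0$, gives the approximation itself; $W_0$ is left out here because the identity being checked does not depend on it.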

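Appendix II: Approximate values of $P_{ue}$ for $p = 1/\sqrt{n}$ and $p = 1 - 1/\sqrt{n}$

Let us consider, at first, the expression of $\tilde{P}_{ue}(V_0,p)$, given by (18), for the approximate
probability of undetected error of VT codes over the Z-channel. For $p = 1/\sqrt{n}$, (18) gives

$$(n+1)^2 W_0 \tilde{P}_{ue}\left(V_0,\frac{1}{\sqrt{n}}\right) = 2^{n} - \left(2-\frac{1}{\sqrt{n}}\right)^{n} - n\frac{1}{\sqrt{n}}\left(2-\frac{1}{\sqrt{n}}\right)^{n-1} + 2n\frac{1}{\sqrt{n}}\left(1+\frac{1}{\sqrt{n}}\right)^{n-1} - 2n\frac{1}{\sqrt{n}} - n(n-1)\frac{1}{n}. \qquad (19)$$

Considering that $\left(2-\frac{1}{\sqrt{n}}\right)^{n} = 2^{n}\left(1-\frac{1}{2\sqrt{n}}\right)^{n}$, we can adopt an approximate expression for
such term. In fact, since $0 < \frac{1}{2\sqrt{n}} \le \frac{1}{2}$, the Taylor expansion

$$\ln\left(1-\frac{1}{2\sqrt{n}}\right) = -\frac{1}{2\sqrt{n}} - \frac{1}{8n} - \frac{1}{24n^{3/2}} - \cdots$$

can be used. This way, we obtain

$$\left(2-\frac{1}{\sqrt{n}}\right)^{n} = 2^{n}e^{n\ln\left(1-\frac{1}{2\sqrt{n}}\right)} = 2^{n}e^{-\frac{\sqrt{n}}{2}-\frac{1}{8}-\frac{1}{24\sqrt{n}}-\cdots} \sim 2^{n}e^{-\frac{\sqrt{n}}{2}-\frac{1}{8}}$$

when $n \to \infty$.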
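
The quality of the approximation $(2 - 1/\sqrt{n})^n \sim 2^n e^{-\sqrt{n}/2 - 1/8}$ can be checked numerically. The sketch below works in the logarithmic domain, so that large n does not overflow, and returns the ratio between the exact term and its approximation. By the Taylor expansion, this ratio should approach 1 from below as n grows. The function names are hypothetical.

```python
import math


def log_exact(n):
    """Natural log of (2 - 1/sqrt(n))**n, computed without forming the power."""
    return n * math.log(2.0 - 1.0 / math.sqrt(n))


def log_approx(n):
    """Natural log of 2**n * exp(-sqrt(n)/2 - 1/8)."""
    return n * math.log(2.0) - math.sqrt(n) / 2.0 - 0.125


def ratio(n):
    """Exact term divided by its asymptotic approximation."""
    return math.exp(log_exact(n) - log_approx(n))


if __name__ == "__main__":
    for n in (10 ** 2, 10 ** 4, 10 ** 6):
        print(n, ratio(n))
```

The leading neglected term in the exponent is $-1/(24\sqrt{n})$, so the relative error decays only as $1/\sqrt{n}$.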

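Similarly, we can obtain $\left(2-\frac{1}{\sqrt{n}}\right)^{n-1} \sim 2^{n-1}e^{-\frac{\sqrt{n}}{2}-\frac{1}{8}}$ and $\left(1+\frac{1}{\sqrt{n}}\right)^{n-1} \sim e^{\sqrt{n}-\frac{1}{2}}$, so (19) can
be rewritten as follows:

$$(n+1)^2 W_0 \tilde{P}_{ue}\left(V_0,\frac{1}{\sqrt{n}}\right) \sim 2^{n} - 2^{n}e^{-\frac{\sqrt{n}}{2}-\frac{1}{8}} - \sqrt{n}\,2^{n-1}e^{-\frac{\sqrt{n}}{2}-\frac{1}{8}} + 2\sqrt{n}\,e^{\sqrt{n}-\frac{1}{2}} - 2\sqrt{n} - n + 1.$$

Considering only the leading terms, with $W_0 \simeq 2^{n}/(n+1)$, we have

$$\tilde{P}_{ue}\left(V_0,\frac{1}{\sqrt{n}}\right) \simeq \frac{1}{n+1}\left[1 - \left(1+\frac{\sqrt{n}}{2}\right)e^{-\frac{\sqrt{n}}{2}-\frac{1}{8}}\right]$$

when $n \to \infty$.

We can adopt the same approach in order to obtain an estimate of $\tilde{P}_{ue}\left(V_0, 1-\frac{1}{\sqrt{n}}\right)$. We get

$$(n+1)^2 W_0 \tilde{P}_{ue}\left(V_0,1-\frac{1}{\sqrt{n}}\right) = 2^{n} - \left(1+\frac{1}{\sqrt{n}}\right)^{n} - (n-\sqrt{n})\left(1+\frac{1}{\sqrt{n}}\right)^{n-1} + 2(n-\sqrt{n})\left(2-\frac{1}{\sqrt{n}}\right)^{n-1} - 2(n-\sqrt{n}) - (n-1)(\sqrt{n}-1)^{2}.$$

Using the approximations above, this can be rewritten as follows:

$$(n+1)^2 W_0 \tilde{P}_{ue}\left(V_0,1-\frac{1}{\sqrt{n}}\right) \sim 2^{n} - (n+1-\sqrt{n})\,e^{\sqrt{n}-\frac{1}{2}} + 2(n-\sqrt{n})\left(2^{n-1}e^{-\frac{\sqrt{n}}{2}-\frac{1}{8}} - 1\right) - (n-1)(\sqrt{n}-1)^{2}.$$

Considering only the leading terms, with $W_0 \simeq 2^{n}/(n+1)$, we have

$$\tilde{P}_{ue}\left(V_0,1-\frac{1}{\sqrt{n}}\right) \simeq \frac{1}{n+1}\left(1 + n\,e^{-\frac{\sqrt{n}}{2}-\frac{1}{8}}\right)$$

when $n \to \infty$.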
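A quite similar approach can be applied to the probability of undetected error of Hamming
codes, over the BSC, as given by (12). In particular, we have:

$$P_{ue}^{BSC}\left(H,\frac{1}{\sqrt{n}}\right) = \frac{1}{n+1} + \frac{n}{n+1}\left(1-\frac{2}{\sqrt{n}}\right)^{\frac{n+1}{2}} - \left(1-\frac{1}{\sqrt{n}}\right)^{n}.$$

As, for large n, $\left(1-\frac{2}{\sqrt{n}}\right)^{\frac{n+1}{2}} \sim e^{-\sqrt{n}-1}$ and $\left(1-\frac{1}{\sqrt{n}}\right)^{n} \sim e^{-\sqrt{n}-\frac{1}{2}}$, this implies the following
approximation:

$$P_{ue}^{BSC}\left(H,\frac{1}{\sqrt{n}}\right) \simeq \frac{1 + n\,e^{-\sqrt{n}-1} - (n+1)\,e^{-\sqrt{n}-\frac{1}{2}}}{n+1} \simeq \frac{1 + n\,e^{-\sqrt{n}-1}\left(1-\sqrt{e}\right)}{n+1}.$$

At the point $p = 1 - \frac{1}{\sqrt{n}}$, instead, we have:

$$P_{ue}^{BSC}\left(H,1-\frac{1}{\sqrt{n}}\right) = \frac{1}{n+1}\left[1 + n\left(1-\frac{2}{\sqrt{n}}\right)^{\frac{n+1}{2}}\right] - n^{-\frac{n}{2}},$$

having taken into account that (n+1)/2 is always even. Moreover, considering that $\left(1-\frac{2}{\sqrt{n}}\right)^{\frac{n+1}{2}} \sim e^{-\sqrt{n}-1}$,
we have:

$$P_{ue}^{BSC}\left(H,1-\frac{1}{\sqrt{n}}\right) \simeq \frac{1 + n\,e^{-\sqrt{n}-1}}{n+1}.$$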
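
The two approximations for Hamming codes can be compared with the exact value numerically. The sketch below assumes the standard expression $P_{ue}^{BSC}(H,p) = \frac{1}{n+1}\left[1 + n(1-2p)^{(n+1)/2}\right] - (1-p)^n$ for a Hamming code of length $n = 2^m - 1$ (taken here to be what (12) states), and evaluates it next to the asymptotic forms at $p = 1/\sqrt{n}$ and $p = 1 - 1/\sqrt{n}$. The function names are hypothetical.

```python
import math


def hamming_pue_bsc(n, p):
    """Exact P_ue of a length-n Hamming code over the BSC (assumed form of (12))."""
    return (1 + n * (1 - 2 * p) ** ((n + 1) // 2)) / (n + 1) - (1 - p) ** n


def approx_at_low_point(n):
    """Asymptotic form at p = 1/sqrt(n)."""
    r = math.sqrt(n)
    return (1 + n * math.exp(-r - 1) - (n + 1) * math.exp(-r - 0.5)) / (n + 1)


def approx_at_high_point(n):
    """Asymptotic form at p = 1 - 1/sqrt(n), neglecting the n**(-n/2) term."""
    r = math.sqrt(n)
    return (1 + n * math.exp(-r - 1)) / (n + 1)


if __name__ == "__main__":
    for m in (5, 7, 10):
        n = 2 ** m - 1
        r = math.sqrt(n)
        print(n, 1 / (n + 1),
              hamming_pue_bsc(n, 1 / r), approx_at_low_point(n),
              hamming_pue_bsc(n, 1 - 1 / r), approx_at_high_point(n))
```

For n = 127 all four values lie very close to 1/(n + 1) = 0.0078125, which is consistent with the nearly flat region discussed for Fig. 9.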



Appendix III: On the probability of undetected error of binary codes over the symmetric channel

Let C be a binary (n, M, d) code (it can be linear or non-linear). By [11, p. 44, Theorem 2.4],

$$P_{ue}(C,p) = \frac{M}{2^{n}}\left[1 + \sum_{i=d^{\perp}}^{n} A_i^{\perp}(1-2p)^{i}\right] - (1-p)^{n},$$

where $A_i^{\perp}$ is the dual weight distribution (the MacWilliams transform of the weight distribution)
of the code (for a linear code this is the weight distribution of the dual code) and $d^{\perp}$ the dual
distance (that is, the least i > 0 such that $A_i^{\perp} \neq 0$).
Therefore,

$$\frac{dP_{ue}(C,p)}{dp} = -\frac{M}{2^{n-1}}\sum_{i=d^{\perp}}^{n} iA_i^{\perp}(1-2p)^{i-1} + n(1-p)^{n-1}. \qquad (20)$$

It is known that $A_i^{\perp} \ge 0$ (see [11, p. 16, Corollary 1.1]). Since $|1-2p| \le 1$ for $p \in [0,1]$, we
get

$$-(1-2p)^{i-1} \le |1-2p|^{i-1} \le |1-2p|^{d^{\perp}-1}$$

for $i \ge d^{\perp}$. Further, it is known (see [11, p. 14, Theorem 1.4]) that

$$\frac{M}{2^{n-1}}\sum_{i=d^{\perp}}^{n} iA_i^{\perp} = n - A_1 \le n.$$

Hence, from (20) we get

$$\frac{dP_{ue}(C,p)}{dp} \le n(1-p)^{n-1} + n|1-2p|^{d^{\perp}-1}.$$

Similarly, we get

$$\frac{dP_{ue}(C,p)}{dp} \ge n(1-p)^{n-1} - n|1-2p|^{d^{\perp}-1}.$$

The term $n(1-p)^{n-1}$ is close to zero for p removed from zero and one, for example for
$1/\sqrt{n} \le p \le 1 - 1/\sqrt{n}$.

The term $n|1-2p|^{d^{\perp}-1}$ is clearly small for p close to 1/2. If $d^{\perp}$ is of some size, it is also
small over some range around p = 1/2. The bounds above show that $\frac{dP_{ue}(C,p)}{dp}$ is also close to
zero, and hence,

$$P_{ue}(C,p) \approx P_{ue}(C,1/2) = (M-1)/2^{n}$$

over this range.
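
The bounds on the derivative can be illustrated on a concrete code. The sketch below takes a Hamming code of length n = 127, for which $M = 2^{120}$ and the dual (simplex) code has all n nonzero codewords of weight (n+1)/2, so that $d^{\perp} = 64$ and $A_{64}^{\perp} = 127$. It evaluates $P_{ue}(C,p)$ and its derivative from the dual weight distribution and checks them against $n(1-p)^{n-1} \pm n|1-2p|^{d^{\perp}-1}$. The function names and the dictionary representation of the dual weight distribution are hypothetical choices made for illustration.

```python
def pue_bsc(n, M, dual, p):
    """P_ue(C, p) over the BSC from the dual weight distribution {i: A_i_perp, i >= 1}."""
    s = sum(a * (1 - 2 * p) ** i for i, a in dual.items())
    return M / 2 ** n * (1 + s) - (1 - p) ** n


def dpue_bsc(n, M, dual, p):
    """Derivative of P_ue(C, p) with respect to p."""
    s = sum(i * a * (1 - 2 * p) ** (i - 1) for i, a in dual.items())
    return -M / 2 ** (n - 1) * s + n * (1 - p) ** (n - 1)


def derivative_bounds(n, d_perp, p):
    """Lower and upper bounds n(1-p)^(n-1) -/+ n|1-2p|^(d_perp-1) on the derivative."""
    a = n * (1 - p) ** (n - 1)
    b = n * abs(1 - 2 * p) ** (d_perp - 1)
    return a - b, a + b


if __name__ == "__main__":
    n, M, dual, d_perp = 127, 2 ** 120, {64: 127}, 64
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        lo, hi = derivative_bounds(n, d_perp, p)
        print(p, pue_bsc(n, M, dual, p), lo, dpue_bsc(n, M, dual, p), hi)
    print((M - 1) / 2 ** n)
```

For this code the lower bound is attained for p < 1/2 and the upper bound for p > 1/2, because the dual weight distribution has a single nonzero term of even weight.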

Fig. 2. $P_{ue}(V_1,p)$ vs. p for some values of n.

Fig. 3. Comparison of $P_{ue}(V_0,p)$ and $P_{ue}(V_1,p)$ for n = 20.

Fig. 4. The lower bounds and the exact results for $P_{ue}(V_0,p)$ for lengths n = 10, 15, 20, 25.

Fig. 5. Comparison of $P_{ue}(V_0,p)$ and $P_{ue}(H,p)$ for n = 15.

Fig. 6. Comparison between the exact values of $A_{i,j}^{(0)}$ (dots) and those derived from the heuristic approximation (continuous lines) as a function of i for n = 20 and some values of j.

Fig. 7. Comparison between the simulated values and the exact curve of $P_{ue}(V_0,p)$ in the case of n = 25.

Fig. 8. Comparison between the simulated values of $P_{ue}(V_0,p)$ and the (heuristic) approximation $\tilde{P}_{ue}(V_0,p)$ for n = 509.

Fig. 9. Comparison between the simulated values of $P_{ue}(H,p)$ and the (heuristic) approximation $\tilde{P}_{ue}(V_0,p)$ for n = 127.

TABLE I

Examples of simulated values of $P_{ue}(V_0,p)$ for some values of n and p

p      n = 36    n = 67    n = 127   n = 247   n = 509
0.05   0.00643   0.00742   0.00647   0.00402   0.00195
0.1    0.01503   0.01261   0.00776   0.00403   0.00197
0.15   0.02088   0.01426   0.00785   0.00404   0.00196
0.2    0.02412   0.01463   0.00783   0.00403   0.00196
0.25   0.02573   0.01469   0.00780   0.00403   0.00197
0.3    0.02645   0.01465   0.00780   0.00404   0.00196
0.35   0.02677   0.01467   0.00782   0.00403   0.00196
0.4    0.02692   0.01479   0.00785   0.00404   0.00196
0.45   0.02692   0.01474   0.00780   0.00403   0.00197
0.5    0.02719   0.01469   0.00780   0.00402   0.00196
0.55   0.02729   0.01476   0.00781   0.00400   0.00195
0.6    0.02751   0.01467   0.00780   0.00402   0.00196
0.65   0.02785   0.01468   0.00781   0.00403   0.00196
0.7    0.02932   0.01468   0.00780   0.00401   0.00197
0.75   0.03402   0.01476   0.00779   0.00403   0.00196
0.8    0.04683   0.01553   0.00782   0.00404   0.00195
0.85   0.08159   0.01962   0.00793   0.00400   0.00197
0.9    0.17323   0.04481   0.00928   0.00405   0.00196
0.95   0.40865   0.19059   0.04674   0.00591   0.00195



